News-Oriented Keyword Indexing with Maximum Entropy Principle

Authors

  • Sujian Li
  • Houfeng Wang
  • Shiwen Yu
  • Chengsheng Xin
Abstract

In the information era, keywords are very useful for information retrieval, text clustering, and related tasks. News is a domain that consistently attracts a large amount of attention. Based on the characteristics of news documents and the resources available, this paper proposes using a Maximum Entropy (ME) model to perform automatic keyword indexing. ME-based keyword indexing centers on two questions: how to obtain all the candidate items, and how to select useful features for the ME model. First, we use relatively mature linguistic techniques and tools to obtain all possible candidate items. Then, we introduce a feature set for the ME model. Finally, we test the model and report experimental results.
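As a rough illustration of the idea, a two-class ME model over candidate items can be sketched as below. This is a minimal sketch, not the authors' implementation: the feature names (`in_title`, `in_first_paragraph`, `is_noun`, `freq_ge_3`), the toy training data, and the plain gradient-ascent trainer are all assumptions made for the example, since the abstract does not list the actual feature set or training procedure.

```python
import math

# Hypothetical binary features for a candidate item; not taken from the paper.
FEATURES = ["in_title", "in_first_paragraph", "is_noun", "freq_ge_3"]

def prob_keyword(weights, active):
    """P(keyword | candidate) for a two-class ME model.

    With features tied to the 'keyword' class only, the ME distribution
    exp(score) / (exp(score) + exp(0)) reduces to the logistic function.
    """
    score = sum(weights[f] for f in active)
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=200, lr=0.5):
    """Fit weights by gradient ascent on the conditional log-likelihood."""
    weights = {f: 0.0 for f in FEATURES}
    for _ in range(epochs):
        grad = {f: 0.0 for f in FEATURES}
        for active, label in examples:
            p = prob_keyword(weights, active)
            for f in active:
                # Observed minus expected count of feature f.
                grad[f] += label - p
        for f in FEATURES:
            weights[f] += lr * grad[f] / len(examples)
    return weights

# Toy, made-up training data: (active features, 1 if keyword else 0).
TRAIN = [
    ({"in_title", "is_noun", "freq_ge_3"}, 1),
    ({"in_title", "is_noun"}, 1),
    ({"in_first_paragraph", "is_noun", "freq_ge_3"}, 1),
    ({"is_noun"}, 0),
    ({"in_first_paragraph"}, 0),
    (set(), 0),
]

weights = train(TRAIN)
p_strong = prob_keyword(weights, {"in_title", "is_noun", "freq_ge_3"})
p_weak = prob_keyword(weights, {"in_first_paragraph"})
```

In this sketch, candidates would be ranked by `prob_keyword` and the top-scoring ones kept as keywords. A practical system would more likely use an established ME or logistic-regression trainer with regularization rather than this hand-rolled loop.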

Similar resources

Keyword Spotting Using Durational Entropy

This paper deals with the task of detection of a given keyword in continuous speech. We build upon a previously proposed algorithm where a modified Viterbi search algorithm is used to detect keywords, without requiring any explicit garbage or filler models. In this work, the concept of durational entropy is used to further discard a large fraction of false alarm errors. Durational entropy is de...

News-Oriented Automatic Chinese Keyword Indexing

In our information era, keywords are very useful to information retrieval, text clustering and so on. News is always a domain attracting a large amount of attention. However, the majority of news articles come without keywords, and indexing them manually is costly. Aiming at news articles’ characteristics and the resources available, this paper introduces a simple procedure to index keywords...

An Information-Theoretic Framework for Semantic-Multimedia Indexing

To solve the problem of indexing collections with diverse text documents, image documents, or documents with both text and images, one needs to develop a model that supports heterogeneous types of documents. In this paper, we show how information theory supplies us with the tools necessary to develop a unique model for text, image, and text/image retrieval. In our approach, for each possible qu...

Work-in-Progress: Automated Named Entity Extraction for Tracking Censorship of Current Events

Tracking Internet censorship is challenging because what content the censors target can change daily, even hourly, with current events. The process must be automated because of the large amount of data that needs to be processed. Our focus in this paper is on automated probing of keyword-based Internet censorship, where natural language processing techniques are used to generate keywords to pro...

A statistical framework for fusing mid-level perceptual features in news story segmentation

News story segmentation is essential for video indexing, summarization and intelligence exploitation. In this paper, we present a general statistical framework, called exponential model or maximum entropy model, that can systematically select the most significant mid-level features of various types (visual, audio, and semantic) and learn the optimal ways in fusing their combinations in story se...

Publication date: 2003